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Abstract: HIV dynamical models are often based on non-linear sys- 
tems of ordinary differential equations (ODE), which do not have analytical 
solution. Introducing random effects in such models leads to very challeng- 
ing non-linear mixed-effects models. To avoid the numerical computation 
of multiple integrals involved in the likelihood, we propose a hierarchical 
likelihood (h-likelihood) approach, treated in the spirit of a penalized like- 
lihood. We give the asymptotic distribution of the maximum h-likelihood 
estimators (MHLE) for fixed effects, a result that may be relevant in a more 
general setting. The MHLE are slightly biased but the bias can be made 
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negligible by using a parametric bootstrap procedure. We propose an effi- 
cient algorithm for maximizing the h-likelihood. A simulation study, based 
on a classical HIV dynamical model, confirms the good properties of the 
MHLE. We apply it to the analysis of a clinical trial. 

Keywords: algorithm, asymptotic, differential equations, h-likelihood, 
HIV dynamics models, non-linear mixed effects model, penalized likelihood. 



1 INTRODUCTION 

Since the influential paper of Ho et al. (1995) there has been a strong 
impetus to develop mathematical models for better understanding the in- 
teraction between HIV and the immune system; see Nowak and May (2000). 
However the statistical inference in these models has raised major challenges 
coming from the intrication of identifiability and numerical problems. The 
first problem is numerical: in general the trajectories of the interesting 
quantities (e.g. viral load or CD4 counts) are solutions of non-linear differ- 
ential equations that do not have analytical solutions. The second problem 
is the identifiability problem: the observations recorded on one subject are 
not informative enough to estimate all the parameters of the model. The 
first problem is either avoided, simplifying the models to obtain analyti- 
cal solutions (Wu and Ding, 1999), or solved by using numerical solvers of 
ordinary differential equations (ODE); Ramsay et al. (2007) proposed an 
original approach but did not apply it to a random effect model. The second 
problem is partly treated by considering that the particular values of the pa- 
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rameters for each subject are realizations of random variables with a given 
distribution in the population. This puts the problem in the framework of 
non-linear mixed effects models. Laplace approximation of the numerical 
integrals involved in the computation of the likelihood has been proposed 
(Beal and Sheiner, 1982; Lindstrom and Bates, 1990); adaptive Gaussian 
quadrature is another possibility (see Davidian and Giltinan, 1995). We re- 
fer to Wu (2005) for a review of statistical issues in HIV models. Recently a 
stochastic approximation EM (SAEM) algorithm has been proposed (Kuhn 
and Lavielle, 2005; Donnet and Samson, 2007). In the specific case of HIV 
dynamics models a Bayesian approach has been proposed by Putter et al. 
(2002) and Huang, Liu and Wu (2006), while a special algorithm for com- 
puting the likelihood and maximizing using a Newton-like method has been 
proposed by Guedj, Thiebaut and Commenges (2007). However all these 
methods present difficulties and can be time-consuming. 

The hierarchical likelihood (h-likelihood) has been proposed for general- 
ized linear models with random effects by Lee and Nelder (1996) and further 
studied in Lee and Nelder (2001) and Lee, Pawitan and Nelder (2006) and 
for non-linear mixed effects models by Noh and Lee (2008). This is very 
similar to an approach called penalized likelihood used by McGilChrist and 
Aisbett (1991) and Therneau and Grambsch (2000) for frailty models. The 
main idea is to treat the random effects (or the frailties) as parameters 
and to find estimates of all the parameters by maximizing a function which 
is essentially the loglikelihood conditional on the random effects minus a 
penalty term which takes large values if the "random" parameters are very 
dispersed. Penalized likelihood has also been used for function estimation 
(O'Sullivan; 1988). The advantage of this approach is that it may avoid 
computing numerical integrals. The curse of dimensionality is transferred 
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from the dimension of numerical integrals to the dimension of the space on 
which maximization takes place. There are problems with this approach. 
One is the asymptotic distribution of the estimators of the fixed parame- 
ters; another is the estimation of the variances of the random parameters. 
Consistency of the maximum h-likelihood estimators (MHLE) has not been 
proved. It is often suggested to revert to the likelihood to have consistent 
estimators of the fixed parameters, but then the most important benefits 
of h-likelihood in terms of computational burden is lost. Last but not least 
is the problem of maximizing a complicated function over several hundred 
parameters. 

The aim of this paper is to develop a (partly non-standard) h-likelihood 
approach to HIV dynamics models which completely avoids computation of 
the likelihood. This is in the spirit of penalized likelihood in the sense that 
we do not try to precisely estimate the variances of the random effects. One 
aim is to study the asymptotic distribution of the MHLE for a given choice 
of the penalty. Another aim is to find an efficient maximization algorithm. 

The paper is organized as follows. In section 2 we describe a statistical 
model based on an ODE system in a general form and in a particular form 
which will be used for simulations. In section 3 we describe h-likelihood and 
we give the asymptotic distribution of the MHLE for fixed effects when the 
number of subjects tends toward infinity. We propose a parametric boot- 
strap procedure to correct the bias of the MHLE. In section 4 we propose 
a strategy for choosing the penalty based on the guess of an upper bound 
of the variance of the random effects. An efficient maximization algorithm 
is presented in section 5. Section 6 presents a simulation study. Section 7 
presents the analysis of a clinical trial. We conclude in section 8. 
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2 A POPULATION DYNAMICS MODEL 



2.1 A general model for the system 

The dynamics of the concentrations of virions and CD4+ T-cells (in short, 
CD4) in different stages (represented by X l (t)) can be described by an 
ODE system. We allow the values of the parameters to vary between sub- 
jects; thus we consider a population model, as in Guedj, Thiebaut and 
Commenges (2007). For subject i with i — 1, ...n, this can be written: 

X\0) = h(?) 

where X l (t) = (X[(t), X l K (t))' is the vector of the K state variables 
(or components); = (£i, •••,££) is a vector of p individual parameters 
which appear naturally in the ODE system and have generally a biological 
interpretation. Similarly to generalized (mixed) linear models, we introduce 
a link function which relates £ l to a linear model involving explanatory 
variables and random effects: 

f (pt + bj + zf(t)(3 l , l = l,...,R, 
[ <p l + zl(t)f3 l , l = R+l,...,p, 

where (pi is the intercept, z\(t) are vectors of explanatory variables associ- 
ated with the fixed effects of the Zth biological parameter; these explana- 
tory variables may be time-dependent, in which case the ODE system has 
time- dependent parameters. The /3j's are vectors of regression coefficients; 
bi = (b{, . . . ,b l R ) is the individual vector of random effects. We assume 
bi ~ jV(0, S) with S diagonal with diagonal elements rf. More general 
models could of course be considered. 
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2.2 Model for the observations 

Let Yijm denote the jth measurement of the rath observable component for 
subject i at time ^ Jm ; we assume that: 

Yij m 9m(-X- {tijrnj) ^ijmi i 1, 71, j 1, Tii m) (3) 

for m = 1,...,M, where g m {.) are known functions and where the e^ m 
are independent Gaussian variables with zero mean and variances a^. The 
e^mS are supposed independent because they represent measurement errors. 
The model for the observations may be complicated by the detection limits 
of assays leading to left-censored observations Yij m . 

2.3 A particular model for HIV dynamics 

For illustrating the proposed method we present a version of a rather stan- 
dard model for the HIV dynamics model, close to that used by Nowak and 
Bangham (1996): 



dT l 

dt 

dT*' 1 

— = lT V — n T „T 

dV i .... 

— = vrT -pyV 



where T l , T* 1 represent the concentrations (implicitly depending on t) of 
non-infected and infected CD4 respectively, and V 1 stands for the concen- 
tration of virus. 

Here the components of £,\ = (\\ 7*, /i^, /i^*, ir\ fiy) represent rates of 
events such as production of new cells or particles, rates of infection after 
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meeting between different particles. As for the we will take the nat- 

ural log-transform for all the parameters: the natural log-transform can be 
justified if we think of the parameters as expectations of Poisson variables 
and has the advantage that the standard deviations of the transformed pa- 
rameters may be interpreted as coefficients of variations of the estimators 
of the natural parameters. For the simulations, in the link equation ^ we 
will take random effects for A*, n l and //y* (so R = 3) with the b\ being 
normal and independent with variances r|, t%, respectively. For all 

parameters except for 7* we will take no explanatory variable. The effect of 
the treatment will be modeled as modifying 7* according to the equation: 

f = l gf = 70 + f3tz\{t)+ fo4(t), 

where z\{t) and z\(f) are treatment indicators. The treatment may change 
with time; here we will suppose that they are fixed for t > but take 
the value for t < 0. We assume that at t = the patients are at the 
equilibrium of the system with z\(t) = z\(t) = and this gives impor- 
tant information. As for the observation equation ^ we will take in the 
simulations: 

Y i:jl = log 10 V l (tiji) + €ij\ 

Y m = [T% 2 )+T*% 2 )]V4 + %2 
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3 THE HIERARCHICAL OR PENALIZED 
LIKELIHOOD 

3.1 Asymptotic Distribution of the MHLE 

Let us consider the following model: conditionally on bi, Yi has a density 
/y(.; 9, where 9 is a vector of fixed parameters of dimension q (9 G C 
3? 9 ) and bi are random effects (or parameters) of dimension R. The (Y i: hi) 
are independently identically distributed (iid). Typically Yi is multivariate 
of dimension rij. We assume that the 6j have density /&(.; r) with zero 
expectation and where r is a vector of parameters. We denote by P* the 
true probability and 9* and r* the parameter values which specify the 
distribution of the observed Y { . Typically Y { is (at least partially) observed 
while bi is not. 

Estimators of both 9 and b = (pi, . . . ,b n ) are defined as maximizing 
the (normalized) extended loglikelihood, called here (by abuse of language) 
h-loglikelihood: 

i 1 n 

HL(6>, b, t) = L e n ' ^2 J (bf, r), 

n i=i 

where L^> is the loglikelihood (normalized by ^) for the observation con- 
ditional on b, and r) = — log r). We denote by (9 T ,b T ) the 
values which maximize HL(#, 6, r) for given r; (? T will be called the MHLE 
of the parameters 9. We have HL(0, 6, r) = ^ X)"=i hl(l^; 6 1 , b h r) with 
hl(Fj; 6 1 , b h r) = l{Yi,9,bi) — J(bi,r), where l(Yi,9,bi) is the loglikelihood 
for subject i conditional on bi. For simpler notation we will not always 
make the dependence in r explicit and will write for instance RL(9, b) for 
BL(9, b, t). We shall make the additional assumptions: 



8 



Al l(y;0,bi) and J(bi, r) are continuous and twice-continuously differ- 
ential) le functions of and foj for all y and r; 
A2 E P 4(F i; 0, 6*) exists for all 9 e 0. 

We shall derive asymptotic results for the MHLE of the fixed parameters 
0, which do not require that r = r*. 

Lemma 1 Under assumptions Al and A2 the MHLE for fixed effects are 
M- estimators. 

Proof. Consider the profile h-loglikelihood PHL„(0) = HL(0,6(0)), where 
6(0) = argmax fe HL(0,6). (0 T ,6(0 T )) maximizes PHL n (0), thus T is the 
profile h-likelihood estimator. Remembering that HL(0, 6) — - X)"=i hl^; 0, 
it is clear that the components of 6(0) are the i>i{9) = argmax b .[hl(l^; 9, bi)]. 
Thus PHL n (0) = HL(0,6(0)) = \ £™ =1 hl(^; 0, ^(0)). It follows that T 
is a M-estimator because it is clear that 9 T is the maximum of M n (9) = 
n ~ X Yh=\ 'TiflO'i), where mo{y) is a known measurable function: here mg(y) = 
h\(y;9,b(y;9)) where b(y;9) = argmax & [hl(y; 0, &)] (Van der Vaart, 1998, p 
41). 

For the convergence result we need the additional assumption: 
A3 For every sufficiently small ball U e ©, Ep» sup eg[/ hl(|/; 0, b(y; 9)) < oo. 

In the convergence theorems of the MHLE we will emphasize the fact 
that it depends on n by writing r = 0j. 

Theorem 1 If Q is compact and assumption A1-A3 holds, the MHLE of 
fixed effects 9% converges in probability toward 0J" = argmaxg Ep» [hl(Fj; 9, bi{9))], 
for any r. 

Proof. By the law of large numbers M n (9) -> p M(9) where M(0) = 
Ep. [hl(Fj; 0, bi{9))\. Let us call 0J" the value, that we assume unique, at 
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which M(9) attains its maximum. The conditions stated in the Theorem, 
together with the continuity assumption Al, allow us to apply Wald's con- 
sistency proof (van der Vaart, 1998, Theorem 5.14, p48). 

Corollary 1 The MHLE of the fixed parameter of the statistical model de- 
scribed in section^converges in probability toward = argmaxg Ep* [hl(YJ; 9, bi(9))]. 

Proof. In the case of the statistical model of section[2]we have hl(Yj; 9, bi(0)) = 
E^ =1 [-f log<-E?i! ( y «"-^**W» a ]-E r * =1 'W, where <j> m (t ijm ; 9, b t ) = 
g m (X 1 (tijm)) (where X l (tij m ) is the solution of the ODE system with pa- 
rameters 9,bi). In case where are fixed, assumption A3 is trivially 
satisfied because we can remove the terms involving cr^ and obtain a func- 
tion which is bounded by zero. If we include the cr^ in the parameters 
that we wish to estimate, assumption A3 is satisfied since h\(Yi]9,bi(9)) < 

v^M n; i 2 v^ni (^m-^fe m i^M^))) 2 <■ v^Af n; i ~ 2 „ In 

2^m=l 2 LO & a m ^j=l — ^m=l 2 10 & "m n i/ 

• + n ~2 E;il(^-^(%-^( e ))) 2 U1 + • , , U , 

with = — - — . It seems reasonable to conjecture that 

Ep. [— log (Yij m — 4>m{Ujm\ 9, bi{9))) 2 } < oo. We can compactify the space 
by taking = ft d . If some parameters take an infinite value, hl(Yi; 9, bi(9)) 
take either the value — oo or a finite value. 

Now the problem is to investigate whether 9T is equal or close to 9*. 
M (9) can be considered as an approximation of minus the Kullback-Leibler 
divergence. This is obtained by replacing the expectation in b by the mode. 
The approximation is exact if m are linear functions in b but this is not 
true in general. However there is a possibility of reducing the bias (see 
section 3.3). 

The asymptotic normal distribution holds for M-estimators under some 
regularity conditions. We make use of Theorem 5.23 of van der Vaart (1998) 
which only requires a Lipshitz condition on mg(y) = hl(y; 9, b(y; 9)) that we 
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can establish if the following assumption bearing on u e (y) = 8hl fa 
holds. We shall use uf = u 6 {Yi) and will give an alternative expression in 
formula 

A4 There is a neighborhood ©o C of 9T such that the function rh(y) = 
sup eeeo u (y) has the property: Ep. ||m(Yj) || 2 < oo. 

Theorem 2 Assume assumptions A1-A4 hold. Then y/n(9^ — dT) is asymp- 
totically normal with zero expectation and variance equal to 

not) = {E P ,[Hf]}- 1 {E P ,[ufuf T )}{E P ,[Hf}}-\ 

where Hf = 

Proof. The theorem follows by applying Theorem 5.23 of van der Vaart 
(1998). In this theorem, the main condition is that there exists a measurable 
function fn with Ep*m 2 < oo such that for every 9i and 9 2 in a neighborhood 
©o of 6*o we have: 

\m dl (y) - m d2 (y)\ < m(j/)||0i - 9 2 \\. (4) 

A Taylor series expansion gives: mg 1 — mg 2 = (6i — 02) T ^f-(O), where 
9 e O . This yields: 

\m ei -me 2 \ < ^0)^-0^ < sup - 9 2 \\. 

Then assumption A4 allows us applying the Theorem 5.23 of van der Vaart 
(1998). 

For applying Theorem [2j it remains to compute the first and second 
derivatives of mg(Yi) in terms of derivatives of the likelihood conditional 
on the random effects. We write li(9,bi(9)) = l(Yi;9,bi(9)). Let us call J| 
(resp.|^) the derivatives of /,(., .) wrt the first (resp. the second) argument 
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and |^ the derivative of J(.) wrt its argument. We have uf 



dx \6,bi(fi) 



+ 



dk 



dbi\ 



dz \e,bi(0) 99 



%\bJ0)- However, because bi(9) maximizes hl(Yi; 9, b) we have 

m, fth, fit 

0. (5) 



dli dbi 



dJ_ 



Hence we obtain that 



U: 



dk 



(6) 



That is uf is simply the derivative of the loglikelihood as if b was fixed, 
computed in (9,bi(9)). 



%_ 

ae 



Next we have Hf 
equation ([5j wrt 9 we have: 

d 2 h , 



a 2 h 



+ 



d 2 h 



dbi 



dx 2 \0,bi(9) ^ dzdx\e,b z (S) ae 



). Differentiating 



d 2 l u 



dk 



d 2 J dbi 



from which we obtain: 
89 19 



d 2 J, 



q z 2 wmo) q z i 



Hence: 
Hf 



d 2 L , 



d 2 L 



dx 2\e,b t m dzdx WMe) 



d 2 U , 



d 2 J, 



Q z 2 \0,bi{$) Q z 2 \bi{6) 



d% 

dxdz le ' k ^' 



In practice we can plug in the estimator 9 T to obtain an estimator of 
£(#o") (using the continuous mapping theorem). We may also use the ob- 
served scores and Hessian. By virtue of the law of large numbers they 
converge toward their expectations, and again the continuous mapping the- 
orem allows to prove consistency of the resulting estimator. 



3.2 Correction of the bias 

We have shown in section 3.2 that the MHLE 9 T tends toward Oj" which 
is in general different from 9*; thus there is an asymptotic bias 9q —9*. 
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Note that the asymptotic distribution is valid for any r, and on the other 
hand, 9 T is biased even for r = r*. Thus the problem of this approach 
is essentially that of the bias, although a small bias may be acceptable if 
it goes with a small variance. We propose to partially correct the bias by 
parametric bootstrap (Efron and Tibshirani, 1993). 

Specifically, for s — 1, . . . , S, generate the bf from /&(., t); generate Yf 
from /y(.; 9 T , bf); compute the MHLE 9 T ' S for these data. An estimator 
of the bias is S^ 1 I]f=i(# r ' s — @ T )- Thus the corrected estimator, called 
cMHLE, is 

s 

e T = e T - s- l Y,^ T ' s -e T ). 

s=l 

This correction slightly increases the variance. The variance of 9 can be 
computed through the formula var Ep* (9 T \ 9 T ) +E P » var (9 T \9 T ). Neglecting 
the bias of the MHLE in this computation we obtain: 

var 9 T w (1 + 5 _1 )var 9 T 

4 PENALTY CHOICE 

Profile likelihood has been proposed by Therneau and Grambsch (2000) and 
Lee and Nelder (2001) but it has the drawback of requiring the computation 
of the marginal likelihood. We propose a strategy for penalty choice which 
avoids this computation. For any choice of r = (ti, . . . , tr) we have that 
9 T has an asymptotic normal distribution with expectation 9j and with 
a variance that can be estimated. We propose to take a reasonable upper 
bound of r, that is, the value t u = (r u , . . . , t u ) where r u is considered 
as an approximate upper bound for the r*. First, note that since we are 
working with natural logarithms of the biological parameters, the Tj may 
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be interpreted as coefficients of variation of these parameters. It seems 
reasonable (and is in agreement with the literature) to expect coefficients 
of variations of parameters such as rate of production of new lymphocytes 
(A) or death rate of uninfected lymphocytes (/it) are not very large, that 
is no more than 0.3. 



Newton-like algorithms use an approximation of the Hessian of the function 
to maximize. Since there are many parameters, this matrix can be very 
large. For instance in our application q = 7, R = 3, n = 100, so the number 
of parameters is q + nR = 307. In complex problems, both gradient and 
Hessian have to be computed numerically. Particular care must be spent 
to compute the Hessian both economically and precisely. The algorithm we 
propose is an adaptation of the Marquardt algorithm (Marquardt, 1963), 
taking advantage of the special structure of the Hessian in our problem. We 
draw two consequences of this special structure: (i) there are many terms 
which are equal to zero, so we do not need to compute them; (ii) the matrix 
is not far from being block-diagonal. 

We shall first consider the particular case where the number of ran- 
dom and fixed effects are equal (R = q) and the loglikelihood of subject 
i, l(Yi] 9, bi), depends only on 6 + 6«. We are interested in maximizing the 
following function: 



5 MAXIMIZATION ALGORITHM 



in a 

HL(0,6) = -£ l(Y t -6M)-Y, 




i=l r=l 



2r 2 
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It is useful to reparametrize in term of a, = 6 + hi. One finds 



HL 



n 



E 



IT - E 

r=l 



2r 2 



1 



n 



Ehl, 



With this parameterization the loglikelihood, = l(Yi]6,di — 6), which 
is the complex part, depends only on so that many derivatives of the 
h-loglikelihood are very simple: 



<9HL _ 1 " < - e r 



(7) 



<9 2 HL 



<9 2 HL 



0, if i ^ %' 



<9 2 HL 



T 



2 ' 



d6 r da % r , tit 2 ' da\da l l, ' ' d9 r d6 r > 

where <5 rr / = 1 if and only if r — r' . This leads to a specific block structure 



of the Hessian matrix, involving blocks A 



2 HL 



dBdai 



(where /r is the identity matrix of dimension R; D does not depend 
on i) and Ci = ^^r; the structure is displayed in Figure jlj 



A 


D 


D 




D 


D 


Gi 






D 




C 2 









o 




D 









Figure 1: Hessian matrix in the case R = q. A - 
Ir is the identity matrix of dimension R, and Ci 



-I R and D = — -I R , 



i a 2 hii 
n a? 



TIT" 
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Fast computation of this large (n + 1)R x (n + 1)R Hessian matrix is 
possible for two reasons: (i) only the terms of blocks Cj require computation 
of the likelihood; (ii) for computing the terms of block C, we only need to 
compute the second derivatives of If (and not of the whole h-likelihood). 
Finally there are nR(R + l)/2 terms to compute, each involving only one 
computation of the solution of the ODE system (needed for the numerical 
differentiation): it follows that the number of computations of the solution 
of ODE system does not exceed that required for the computation of the 
Hessian for an ordinary (without random effect) non-linear model with q 
parameters ! 

The nearly diagonal structure of the Hessian led us to design the so- 
called "patient-by-patient" algorithm, decoupling the optimization between 
patients. Denoting by ai(k) and 6(k) the values at iteration k, iteration k+1 
proceeds in two steps: 

Step 1: For % = 1, . . . ,n: make one Marquardt step for optimizing the 
function If — X^-Li ^ r ~^ k ^ on af, this gives ai{k + 1); Step 2: compute 
9 r (k + 1) = i Yh=i a r(k + 1) (which satisfies Q); go to step 1 (until con- 
vergence is reached). 

The patient-by-patient algorithm works very well far from the maximum 
when the global Marquardt algorithm is hampered by the need of a large 
increase of the diagonal of the Hessian. However the decoupling between 
patients also leads to a loss of efficiency so that close to the maximum 
it is less efficient than the global Marquardt algorithm. This observation 
led us to devise a hybrid algorithm: use the patient-by-patient algorithm 
until all blocks Cj are definite-positive; then switch to the global Marquardt 
algorithm. Note that ensuring that all blocks Cj be definite-positive does 
not imply that the Hessian is so; generally however it is not far from being 
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the case so that the Marquardt algorithm is efficient. 

We now consider the case where there are R fixed parameters, that we 
call a, associated with a random effect; such as above the loglikelihood of 
subject i, l(Yi\ 9, fej), depends only on a + bi and a vector of fixed parameters 
(3. As in the preceding case, the Hessian has a particular structure (see 
Figure |2l). It involves the blocks A, D and Cj as above, and in addition 



blocks B = 2?hl and B 



_ 1 d 2 hij 

1 n dftdcii ' 



a p a, 



A 





D 


D 




D 





B 


B, 


B 2 




B„ 


D 


Bi 


c, 






D 


B 2 




c 2 











o 




D 


B„ 






C„ 



Figure 2: Hessian matrix in fixed and random effects case where A = 
— -Ir, D = — -I R (where Ir is the identity matrix of dimension R), 

d _ <9 2 HL d _ 1 gghlj nT1 J _ 19 2 hli 
■° ~ a/3 2 ' _ n dpdcn dIla 0i _ n ~8^f 

The idea, like previously, is to deal with the case where there are non- 
definite positive Cj. For that, we use the two steps of the patient-by-patient 
approach which give the individual parameters at(k + 1) and their means 
a(k+l). Then keeping these values fixed, we find the other fixed parameters 
f3{k + 1) by a step of Marquardt algorithm with the block B. As soon as 
all blocks C{ and the block B are definite positive, we switch to the global 
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Marquardt algorithm. 



6 A SIMULATION STUDY 

6.1 Description of the simulation study 



We did simulations from the model described in section 2.3 We fixed (that 
is we did not estimate) the parameters jly, fir and <7j, at values which 
are plausible in view of the literature (taking as time unit the day and 
as volume unit the micro-liter): \xy = 3.40; jlx = —2.20; cr^ = 0.5, i = 
1,2,3. The values for the other parameters (to be estimated), including 
the two treatment effects (3% and 02, are given in Table 2. For each replica, 
observations for n = 100 subjects were generated; for each subject rii m = 10 
observations for the three compartments (m = 1, 2, 3) were generated at 
times 0, 3, 6, 9, 12, 15, 18, 21, 24, 30. 



6.2 Efficiency of the algorithm 

We did a simulation to compare the number of iterations of the global Mar- 
quardt algorithm and the hybrid algorithm. We tried the two algorithms 
with models including one to three random effects. The initial values were: 
A = 5.0; pL T * = 0; ^ = 0; 70 = -5.0; ft = -1.0; ft = -1.0. The global 
Marquardt algorithm did not always converge in less than 150 iterations 
while the hybrid algorithm nearly always converged (see Table [T]); when 
they both converged, this was toward the same values (close to the true 
parameter values). We checked that when we started from different values 
the algorithms converged toward the same values. The hybrid algorithm is 
faster than the global one. In Table[T]we give the mean number of iterations 



18 



Table 1: Percentage of convergence in less than 150 iterations and mean 
number of iterations to converge with different random effects for the Global 
Marquardt and Hybrid algorithm. 



Random effects 


Global algorithm 
nb iter % success 


Hybrid algorithm 
nb iter % success 


A 


35 


81% 


11 


100% 


A, Jit* 


54 


59% 


17 


100% 


A, fir*, 7T 


71 


49% 


25 


94% 



until convergence (computed on 100 replications) for which the algorithm 
converged in less than 150 iterations. For instance the mean number of 
iterations for 3 random effects was 25 versus 71 for the hybrid versus the 
global algorithm. The mean time of one iteration is about the same for the 
two algorithms. To give an idea in terms of computation time, the hybrid 
algorithm took about 10 mn for the case with three random effects on a 
standard work station (Bi Xeon, 3.8 GHz). 

6.3 Efficiency of the bias correction 

We estimated the bias of the corrected 9 T and uncorrected MHLE 6 T using 
500 replicas of a distribution with three random effects bearing on A, fix* , 
We first examine the case where r = r* = (0.2, 0.2, 0.2). The biases of the 
uncorrected MHLE are of order 10 -2 for all parameters. The correction 
reduces the biases to the order of 10~ 3 (except for one parameter), which 
seems negligible. 
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Table 2: Parameter values for the simulation and uncorrected and corrected 
biases 



Parameters 


True value 


Mean estimated value 


Bias 






non corr 


corr 


non corr 


corr 


A 


4.10 


4.14 


4.10 


4.03 10~ 2 


1.67 KT 3 


fir* 


-1.60 


-1.54 


-1.60 


5.57 10~ 2 


3.67 10- 3 


7T 


-0.170 


-0.160 


-0.166 


1.01 lO- 2 


3.69 10~ 3 


70 


-3.00 


-2.98 


-3.00 


1.50 10~ 2 


-3.60 10~ 3 


01 


-1.10 


-1.08 


-1.10 


2.04 10~ 2 


2.59 10~ 3 


02 


-1.40 


-1.35 


-1.39 


4.55 10- 2 


1.11 io- 2 



6.4 Property of the cMHLE 

We wished to check whether the asymptotic results hold in practice. We 
simulated data from the standard model of section 12.31 In the first simu- 
lation (case 1) we took as standard deviations of the random effects = 
t* t = r* Tt = 0.2. In a second simulation (case 2) we took = 0.1, 
t* t = 0.2, r* = 0.3. We did 500 replications and computed the root 
mean square errors (RMSE) and coverage rate of .95 confidence intervals 
of the estimated fixed parameters obtained in fixing the components of t u 
in the h-likelihood at values r u = 0.1; 0.2; 0.3. The results for the RMSE 
are shown in Table [3] In the first case, the results tend to be better when 
t u = 0.2 which is closer to the r*, while r" = 0.3 tends to be better than 
t u = 0.1. For the second case the results for r u = 0.2 and r u = 0.3 were 
approximately of the same quality, better than for r u — 0.1. It is striking 
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that most of the RMSE are roughly of the same order, between 10~ 2 and 
1CT 1 . These RMSE can be interpreted as typical relative errors on the nat- 
ural parameter; in these simulation the order of magnitude is about 5%. 
In term of coverage rates, the results (see Table [4]) are not very good for 
t u = 0.1. They are satisfactory for r u = 0.2 , and even more satisfactory 
for t u = 0.3. This corroborates our strategy based on a reasonable upper 
bound r u of the r*. 

Table 3: Root Mean Square Error, 100 subjects, 500 replications. 







0.1 






0.2 






0.3 




Par. 


case 1 




case 2 




case 1 




case 2 




case 1 




case 2 




A 


5.62 10- 


-2 


4.47 10- 


-2 


3.34 10" 


-2 


3.94 10" 


-2 


4.95 10- 


-2 


3.76 10- 


2 


[L T * 


9.10 10- 


-2 


7.67 10- 


-2 


4.64 10" 


-2 


6.84 10- 


-2 


8.50 10- 


-2 


7.47 10- 


2 


7T 


7.27 10- 


-2 


5.95 10- 


-2 


5.14 10" 


-2 


5.86 10- 


-2 


5.25 10- 


-2 


5.71 10- 


2 


70 


2.60 10- 


-1 


1.88 10- 


-1 


1.35 10- 


-2 


1.56 10- 


-1 


1.77 10- 


-1 


1.52 10- 


-1 


ft 


1.74 10- 


-1 


1.59 10- 


-1 


1.01 io- 


-1 


1.04 10- 


1 


1.04 10" 


-1 


9.89 10- 


2 


fa 


1.67 10- 


-1 


1.91 10- 


-1 


1.02 10- 


-1 


1.35 10- 


-1 


1.06 10- 


-1 


9.90 10- 


2 
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Table 4: Coverage rate, 100 subjects, 500 replications. 



r 


n 1 
U.l 




U.Z 




0.3 




T~) 

rar. 


case 1 < 


:ase 2 


case 1 < 


:ase 2 


case 1 case 2 


A 


90% 


81% 


95% 


92% 


89% 


89% 


[It* 


94% 


81% 


94% 


88% 


93% 


89% 


7T 


96% 


95% 


94% 


92% 


93% 


94% 


70 


97% 


97% 


98% 


96% 


94% 


95% 




94% 


87% 


97% 


93% 


93% 


94% 


ft 


85% 


74% 


97% 


93% 


92% 


93% 



7 APPLICATION TO A CLINICAL TRIAL 

As an application of the proposed method, we aimed at estimating the 
difference of treatment effects in a randomized clinical trial (Molina et al., 
1999). The ALBI ANRS 070 trial compared over 24 weeks the combination 
of zidovudine plus lamivudine (AZT+3TC) with that of stavudine plus 
didanosine (ddI+d4T) (a third arm alternating from one regimen to another 
was not considered in this paper). The inclusion criteria were CD4 > 200 
cells///!/ and HIV RNA level between 4 and 5 logw copies/mL within 15 
days before entry into the study. The primary outcome measure defined in 
the study protocol was the antiretroviral effect as measured by the mean 
change in HIV RNA level between baseline and 24 weeks by use of the 
ultra-sensitive PCR assay with lower limit of quantification of 50 copies/mL 
(1.7 logio). In the main analysis of Molina et al. (1999), HIV RNA values 
reported as < 50 copies/mL were considered equivalent to 50 copies/mL; 51 
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patients were included in each treatment group. Over the 24-week period, 
HIV RNA level declined in the two groups, with mean (SE) decreases at 
the end of the study of 1.26 (0.09) logio copies/mL in the AZT+3TC group 
and 2.26 (0.11) log w copies/mL in the ddI+d4T group. 

We used the model described in section 2.3. In this application only 
the first two components Y^i and Y^2 were observed. Moreover only a 
left-censored version of Y^i was observed; this was taken into account in 
the likelihood as in Guedj, Thiebaut and Commenges (2007). In view of 
less informative observations than in the simulations we fixed the values of 
three parameters: \xt = —2.20, fly = 3.40 and 70 = —3. We put random 
effects on A, ir and /i^, working with r u = 0.3. The estimated values of the 
parameters in natural logarithmic scale are displayed in Table 5. Reverting 
to natural parameters we find: A = 56.8 [54.1; 59.7]; n = 0.79 [0.67; 0.92]; 
fi T * = 0.18 [0.17; 0.19]. In addition it was possible to test whether the two 
treatment groups differed. The relevant null hypothesis is "77 = 0", where 
77 = — Pi- A natural test statistic is W = 7== , where fj = f3 2 — f3i and 

V var i) 

va? fj can easily be computed from the estimate of the asymptotic variance 
matrix E. 

We found fj = 0.242, var fj = 5.16 10~ 3 ; this gives W = 3.37 and a 
p-value equal to p = 7 10~ 4 . Thus we conclude as expected that the treat- 
ment groups differ, and more precisely that the infectivity of the virus has 
been reduced more drastically in the ddI+d4T than in AZT+3TC group. 
Baseline infectivity is multiplied by a factor estimated to e^ 2 = 0.25 and 
e^ 1 = 0.32 in the ddI+d4T than in AZT+3TC groups respectively. 



23 



Table 5: Estimated parameters based on the ALBI clinical trial 



1 dbL dlilt: LCI O 


Tin PHrroft t»ri \/ q In oc 


( nrropton \/ Q In oc 
V^yvJl 1 ct LcU. V dl Ut:o 


vjUllllLlcllL^t; llltcl Veil 


A 


4.05 


4.04 


[3.99; 4.09] 


7T 


-0.129 


-0.242 


[-0.401; -0.083] 




-1.74 


-1.73 


[-1.80; -1.65] 


Pi 


-1.33 


-1.12 


[-1.29; -0.957] 




-1.53 


-1.37 


[-1.56; -1.17] 


&CD4 


0.173 


0.168 


[0.151; 0.185] 


&CV 


0.584 


0.541 


[0.501; 0.582] 



8 CONCLUSION 

We have developed a hierarchical likelihood approach for inference in an 
HIV dynamical model. We have obtained the asymptotic distribution of the 
MHLE, we have derived a procedure which makes the bias negligible and 
we have developed an efficient maximization algorithm. Our simulations 
show that the whole approach works. 

We have shown that it could be applied to the analysis of a real data 
set. Rather precise estimates of the parameters were obtained. One limi- 
tation of this approach is that some parameters must be fixed because of 
identifiability problems. The model itself, although it is already statisti- 
cally challenging, may be too simple from a biological point of view. The 
development of such an approach would require richer data, for instance 
observing the number of infected T cells. 
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The main advantage of this approach is that it is easy to implement and 
very fast as compared to the two main competing approaches, likelihood 
and Bayesian inference. The main limitation is that it does not attempt to 
estimate the variances of the random effects. In our application we already 
have a knowledge of the range of values of these variances. Thus the method 
can be used for exploring possible models while likelihood or Bayesian in- 
ference can be used when estimates of the variances of the random effects 
are needed. 

Acknowledgments. The authors thank the investigators of the ALBI 
ANRS-070 clinical trial and particularly J. M. Molina (principal investiga- 
tor) and G. Chene (methodologist). 
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